Kleberg County
REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing
Chen, Binger, Bök, Tacettin Emre, Rasti, Behnood, Markl, Volker, Demir, Begüm
Foundation Models (FMs) are increasingly used in remote sensing (RS) for tasks such as environmental monitoring, disaster assessment, and land-use mapping. These models include unimodal vision encoders trained on a single data modality and multimodal architectures trained on combinations of SAR, multispectral, hyperspectral, and image-text data. They support diverse RS tasks including semantic segmentation, image classification, change detection, and visual question answering. However, selecting an appropriate remote sensing foundation model (RSFM) remains difficult due to scattered documentation, heterogeneous formats, and varied deployment constraints. We introduce the RSFM Database (RS-FMD), a structured resource covering over 150 RSFMs spanning multiple data modalities, resolutions, and learning paradigms. Built on RS-FMD, we present REMSA, the first LLM-based agent for automated RSFM selection from natural language queries. REMSA interprets user requirements, resolves missing constraints, ranks candidate models using in-context learning, and provides transparent justifications. We also propose a benchmark of 75 expert-verified RS query scenarios, producing 900 configurations under an expert-centered evaluation protocol. REMSA outperforms several baselines, including naive agents, dense retrieval, and unstructured RAG-based LLMs. It operates entirely on publicly available metadata and does not access private or sensitive data.
- Europe > Austria > Vienna (0.14)
- Asia > Singapore (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Research Report (0.64)
- Workflow (0.47)
CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents
Valentini, Francisco, Kozlowski, Diego, Larivière, Vincent
Cross-lingual information retrieval (CLIR) helps users find documents in languages different from their queries. This is especially important in academic search, where key research is often published in non-English languages. We present CLIRudit, a novel English-French academic retrieval dataset built from Érudit, a Canadian publishing platform. Using multilingual metadata, we pair English author-written keywords as queries with non-English abstracts as target documents, a method that can be applied to other languages and repositories. We benchmark various first-stage sparse and dense retrievers, with and without machine translation. We find that dense embeddings without translation perform nearly as well as systems using machine translation, that translating documents is generally more effective than translating queries, and that sparse retrievers with document translation remain competitive while offering greater efficiency. Along with releasing the first English-French academic retrieval dataset, we provide a reproducible benchmarking method to improve access to non-English scholarly content.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
CODE-II: A large-scale dataset for artificial intelligence in ECG analysis
Abreu, Petrus E. O. G. B., Paixão, Gabriela M. M., Li, Jiawei, Gomes, Paulo R., Macfarlane, Peter W., Oliveira, Ana C. S., Carvalho, Vinicius T., Schön, Thomas B., Ribeiro, Antonio Luiz P., Ribeiro, Antônio H.
Data-driven methods for electrocardiogram (ECG) interpretation are rapidly progressing. Large datasets have enabled advances in artificial intelligence (AI) based ECG analysis, yet limitations in annotation quality, size, and scope remain major challenges. Here we present CODE-II, a large-scale real-world dataset of 2,735,269 12-lead ECGs from 2,093,807 adult patients collected by the Telehealth Network of Minas Gerais (TNMG), Brazil. Each exam was annotated using standardized diagnostic criteria and reviewed by cardiologists. A defining feature of CODE-II is a set of 66 clinically meaningful diagnostic classes, developed with cardiologist input and routinely used in telehealth practice. We additionally provide CODE-II-open, an openly available subset of 15,000 patients, and CODE-II-test, a non-overlapping set of 8,475 exams reviewed by multiple cardiologists for blinded evaluation. A neural network pre-trained on CODE-II achieved superior transfer performance on external benchmarks (PTB-XL and CPSC 2018) and outperformed alternatives trained on larger datasets.
- South America > Brazil > Minas Gerais (0.24)
- Europe > Germany (0.04)
- Asia > China > Zhejiang Province > Ningbo (0.04)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Research Report > New Finding (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Foundation Models in Medical Imaging: A Review and Outlook
van Veldhuizen, Vivien, Botha, Vanessa, Lu, Chunyao, Cesur, Melis Erdal, Lipman, Kevin Groot, de Jong, Edwin D., Horlings, Hugo, Sanchez, Clárisa I., Snoek, Cees G. M., Wessels, Lodewyk, Mann, Ritse, Marcus, Eric, Teuwen, Jonas
Foundation models (FMs) are changing the way medical images are analyzed by learning from large collections of unlabeled data. Instead of relying on manually annotated examples, FMs are pre-trained to learn general-purpose visual features that can later be adapted to specific clinical tasks with little additional supervision. In this review, we examine how FMs are being developed and applied in pathology, radiology, and ophthalmology, drawing on evidence from over 150 studies. We explain the core components of FM pipelines, including model architectures, self-supervised learning methods, and strategies for downstream adaptation. We also review how FMs are being used in each imaging domain and compare design choices across applications. Finally, we discuss key challenges and open questions to guide future research.
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
- Overview (1.00)
- Research Report > New Finding (0.45)
- Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Machine learning-based cloud resource allocation algorithms: a comprehensive comparative review
Cloud resource allocation has emerged as a major challenge in modern computing environments, with organizations struggling to manage complex, dynamic workloads while optimizing performance and cost efficiency. Traditional heuristic approaches prove inadequate for handling the multi-objective optimization demands of existing cloud infrastructures. This paper presents a comparative analysis of state-of-the-art artificial intelligence and machine learning algorithms for resource allocation. We systematically evaluate 10 algorithms across four categories: Deep Reinforcement Learning approaches, Neural Network architectures, Traditional Machine Learning enhanced methods, and Multi-Agent systems. Analysis of published results demonstrates significant performance improvements across multiple metrics including makespan reduction, cost optimization, and energy efficiency gains compared to traditional methods. The findings reveal that hybrid architectures combining multiple artificial intelligence and machine learning techniques consistently outperform single-method approaches, with edge computing environments showing the highest deployment readiness. Our analysis provides critical insights for both academic researchers and industry practitioners seeking to implement next-generation cloud resource allocation strategies in increasingly complex and dynamic computing environments.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Overview (1.00)
- Research Report (0.84)
- Information Technology > Services (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Law (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Assay2Mol: large language model-based drug design using BioAssay context
Deng, Yifan, Ericksen, Spencer S., Gitter, Anthony
Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate candidate molecules' functional responses against disease targets. Unstructured text that describes the biological mechanisms through which these targets operate, experimental screening protocols, and other attributes of assays offers rich information for drug discovery campaigns but has remained untapped because of its unstructured format. We present Assay2Mol, a large language model-based workflow that can capitalize on the vast existing biochemical screening assays for early-stage drug discovery. Assay2Mol retrieves existing assay records involving targets similar to the new target and generates candidate molecules using in-context learning with the retrieved assay screening data. Assay2Mol outperforms recent machine learning approaches that generate candidate ligand molecules for target protein structures, while also promoting more synthesizable molecule generation.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Texas > Kleberg County (0.04)
Generative AI as a Linguistic Equalizer in Global Science
Filimonovic, Dragan, Rutzer, Christian, Macher, Jeffrey, Weder, Rolf
For decades, the dominance of English has created a substantial barrier in global science, disadvantaging non-native speakers. The recent rise of generative AI (GenAI) offers a potential technological response to this long-standing inequity. We provide the first large-scale evidence testing whether GenAI acts as a linguistic equalizer in global science. Drawing on 5.65 million scientific articles published from 2021 to 2024, we compare GenAI-assisted and non-assisted publications from authors in non-English-speaking countries. Using text embeddings derived from a pretrained large language model (SciBERT), we measure each publication's linguistic similarity to a benchmark of scientific writing from U.S.-based authors and track stylistic convergence over time. We find significant and growing convergence for GenAI-assisted publications after the release of ChatGPT in late 2022. The effect is strongest for domestic coauthor teams from countries linguistically distant from English. These findings provide large-scale evidence that GenAI is beginning to reshape global science communication by reducing language barriers in research.
The rapid rise of generative AI (GenAI) has sparked an important debate regarding its role in science, raising questions of whether it homogenizes writing and erodes authorship norms (1,2) or whether it acts as a "linguistic equalizer" that lowers barriers for non-native English speakers (3,4). This debate is especially salient because English has long dominated global science, which gives native speakers a structural advantage (5-7) by creating larger writing burdens and unique peer review bias risks for researchers from non-Anglophone countries (8-12). As a result, many of these researchers have historically spent time in the U.S. or the UK to learn how to write in English or have hired (expensive) language experts (13, 14).
Against this backdrop, the release of ChatGPT in late 2022, a chatbot based on a large language model (LLM), marked a turning point. This widely accessible, low-cost, and human-like tool offers a potential means of reducing longstanding linguistic imbalances (15, 16).
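The abstract above describes measuring each publication's embedding similarity to a benchmark of U.S.-authored writing. The text does not state the exact metric, so the following is only a minimal sketch under stated assumptions: cosine similarity against the centroid of the benchmark embeddings, with tiny hand-made vectors standing in for SciBERT embeddings. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def similarity_to_benchmark(document_vector, benchmark_vectors):
    """Assumed measure: cosine similarity to the benchmark centroid."""
    return cosine_similarity(document_vector, centroid(benchmark_vectors))


# Toy stand-ins for embeddings; real SciBERT vectors have far more dimensions.
benchmark = [[1.0, 0.0], [0.0, 1.0]]
aligned_doc = [1.0, 1.0]   # points in the same direction as the centroid
skewed_doc = [1.0, 0.0]    # 45 degrees away from the centroid

aligned_score = similarity_to_benchmark(aligned_doc, benchmark)
skewed_score = similarity_to_benchmark(skewed_doc, benchmark)
```

Tracking such a score per publication year would be one way to observe convergence over time, though the study's actual procedure may differ.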
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > South Korea (0.04)
- Oceania > New Zealand (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
AI Bill of Materials and Beyond: Systematizing Security Assurance through the AI Risk Scanning (AIRS) Framework
Nathanson, Samuel, Lee, Alexander, Kieffer, Catherine Chen, Junkin, Jared, Ye, Jessica, Saeed, Amir, Lockhart, Melanie, Fink, Russ, Peterson, Elisha, Watkins, Lanier
Assurance for artificial intelligence (AI) systems remains fragmented across software supply-chain security, adversarial machine learning, and governance documentation. Existing transparency mechanisms - including Model Cards, Datasheets, and Software Bills of Materials (SBOMs) - advance provenance reporting but rarely provide verifiable, machine-readable evidence of model security. This paper introduces the AI Risk Scanning (AIRS) Framework, a threat-model-based, evidence-generating framework designed to operationalize AI assurance. The AIRS Framework evolved through three progressive pilot studies - Smurf (AIBOM schema design), OPAL (operational validation), and Pilot C (AIRS) - that reframed AI documentation from descriptive disclosure toward measurable, evidence-bound verification. The framework aligns its assurance fields to the MITRE ATLAS adversarial ML taxonomy and automatically produces structured artifacts capturing model integrity, packaging and serialization safety, structural adapters, and runtime behaviors. Currently, the AIRS Framework is scoped to provide model-level assurances for LLMs, but it could be expanded to include other modalities and cover system-level threats (e.g. application-layer abuses, tool-calling). A proof-of-concept on a quantized GPT-OSS-20B model demonstrates enforcement of safe loader policies, per-shard hash verification, and contamination and backdoor probes executed under controlled runtime conditions. Comparative analysis with SBOM standards of SPDX 3.0 and CycloneDX 1.6 reveals alignment on identity and evaluation metadata, but identifies critical gaps in representing AI-specific assurance fields. The AIRS Framework thus extends SBOM practice to the AI domain by coupling threat modeling with automated, auditable evidence generation, providing a principled foundation for standardized, trustworthy, and machine-verifiable AI risk documentation.
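The per-shard hash verification mentioned in the abstract above can be illustrated with a short sketch. This is not the AIRS implementation: the manifest layout (shard file name mapped to a SHA-256 hex digest), the function names, and the shard file names are assumptions made for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_shards(manifest: dict, root: Path) -> dict:
    """Report, per shard name, whether the file exists and matches its digest."""
    results = {}
    for name, expected in manifest.items():
        shard = root / name
        results[name] = shard.is_file() and sha256_of_file(shard) == expected
    return results


def demo() -> dict:
    """Build two fake shards, record their digests, then tamper with one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        names = ("shard-0.bin", "shard-1.bin")  # hypothetical shard names
        (root / names[0]).write_bytes(b"weights-a")
        (root / names[1]).write_bytes(b"weights-b")
        manifest = {name: sha256_of_file(root / name) for name in names}
        # Simulate tampering after the manifest was recorded.
        (root / names[1]).write_bytes(b"weights-b-modified")
        return verify_shards(manifest, root)
```

Calling `demo()` flags the modified shard while the untouched one still verifies. A real pipeline would also need to protect the manifest itself, for example with a signature.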
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models
Palmini, Maria-Teresa De Rosa, Cetinic, Eva
Our work addresses the ambiguity between generalization and memorization in text-to-image diffusion models, focusing on a specific case we term multimodal iconicity. This refers to instances where images and texts evoke culturally shared associations, such as when a title recalls a familiar artwork or film scene. While prior research on memorization and unlearning emphasizes forgetting, we examine what is remembered and how, focusing on the balance between recognizing cultural references and reproducing them. We introduce an evaluation framework that separates recognition, whether a model identifies a reference, from realization, how it depicts it through replication or reinterpretation, quantified through measures capturing both dimensions. By evaluating five diffusion models across 767 Wikidata-derived cultural references spanning static and dynamic imagery, we show that our framework distinguishes replication from transformation more effectively than existing similarity-based methods. To assess linguistic sensitivity, we conduct prompt perturbation experiments using synonym substitutions and literal image descriptions, finding that models often reproduce iconic visual structures even when textual cues are altered. Finally, our analysis shows that cultural alignment correlates not only with training data frequency, but also textual uniqueness, reference popularity, and creation date. Our work reveals that the value of diffusion models lies not only in what they reproduce but in how they transform and recontextualize cultural knowledge, advancing evaluation beyond simple text-image matching toward richer contextual understanding.
- Europe > Switzerland > Zürich > Zürich (0.40)
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
- Media (0.93)
- Leisure & Entertainment (0.67)
Cormorant: Covariant Molecular Neural Networks
Brandon Anderson, Truong Son Hy, Risi Kondor
We propose Cormorant, a rotationally covariant neural network architecture for learning the behavior and properties of complex many-body physical systems. We apply these networks to molecular systems with two goals: learning atomic potential energy surfaces for use in Molecular Dynamics simulations, and learning ground state properties of molecules calculated by Density Functional Theory. Some of the key features of our network are that (a) each neuron explicitly corresponds to a subset of atoms; (b) the activation of each neuron is covariant to rotations, ensuring that overall the network is fully rotationally invariant. Furthermore, the non-linearity in our network is based upon tensor products and the Clebsch-Gordan decomposition, allowing the network to operate entirely in Fourier space. Cormorant significantly outperforms competing algorithms in learning molecular Potential Energy Surfaces from conformational geometries in the MD-17 dataset, and is competitive with other methods at learning geometric, energetic, electronic, and thermodynamic properties of molecules on the GDB-9 dataset.
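The distinction the abstract draws between covariant activations and an invariant output can be shown on a toy example. The sketch below is not the Cormorant architecture and uses no Clebsch-Gordan machinery: it only checks that a displacement vector between two atoms rotates along with the input (covariance), while its length does not change (invariance). The atom coordinates, the rotation angle, and all function names are made up for illustration.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def displacement(a, b):
    """Vector pointing from atom a to atom b."""
    return tuple(bi - ai for ai, bi in zip(a, b))


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in vector))


# Hypothetical atom positions and an arbitrary rotation angle.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
angle = 0.7
rotated_atoms = [rotate_z(p, angle) for p in atoms]

# Covariant feature: computing it on rotated inputs equals rotating the feature.
d_original = displacement(atoms[0], atoms[2])
d_from_rotated = displacement(rotated_atoms[0], rotated_atoms[2])
d_then_rotated = rotate_z(d_original, angle)

# Invariant feature: the interatomic distance is unchanged by the rotation.
distance_original = norm(d_original)
distance_rotated = norm(d_from_rotated)
```

Here `d_from_rotated` agrees with `d_then_rotated` up to floating-point error, and the two distances coincide, which is the behaviour a network needs if its final prediction (for example an energy) should not depend on how the molecule is oriented.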
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada (0.04)